Corpora for conceptualisation and zoning of scientific papers

نویسندگان

  • Maria Liakata
  • Simone Teufel
  • Advaith Siddharthan
  • Colin Batchelor
چکیده

We present two complementary annotation schemes for sentence based annotation of full scientific papers, CoreSC and AZ-II, which have been applied to primary research articles in chemistry. The AZ scheme is based on the rhetorical structure of a scientific paper and follows the knowledge claims made by the authors. It has been shown to be reliably annotated by independent human coders and has proven useful for various information access tasks. AZ-II is its extended version, which has been successfully applied to chemistry. The CoreSC scheme takes a different view of scientific papers, treating them as the humanly readable representations of scientific investigations. It therefore seeks to retrieve the structure of the investigation from the paper as generic high-level Core Scientific Concepts (CoreSC). CoreSCs have been annotated by 16 chemistry experts over a total of 265 full papers in physical chemistry and biochemistry. We describe the differences and similarities between the two schemes in detail and present the two corpora produced using each scheme. There are 36 shared papers in the corpora, which allows us to quantitatively compare aspects of the annotation schemes. We show the correlation between the two schemes, their strengths and weaknesses and discuss the benefits of combining a rhetorical based analysis of the papers with a content-based one.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Zones of conceptualisation in scientific papers: a window to negative and speculative statements

In view of the increasing need to facilitate processing the content of scientific papers, we present an annotation scheme for annotating full papers with zones of conceptualisation, reflecting the information structure and knowledge types which constitute a scientific investigation. The latter are the Core Scientific Concepts (CoreSCs) and include Hypothesis, Motivation, Goal, Object, Backgroun...

متن کامل

Accurate Argumentative Zoning with Maximum Entropy models

We present a maximum entropy classifier that significantly improves the accuracy of Argumentative Zoning in scientific literature. We examine the features used to achieve this result and experiment with Argumentative Zoning as a sequence tagging task, decoded with Viterbi using up to four previous classification decisions. The result is a 23% F-score increase on the Computational Linguistics co...

متن کامل

A Comparative Analysis of Metadiscourse Markers in the Result and Discussion Sections of Literature and Engineering Research Papers

This study compares metadiscourse markers in result and discussion sections of literature and engineering research papers. To this end, 40 research articles (20 literature and 20 engineering) are selected from two major international journals. Based on Hyland’s (2005) model of metadiscourse, the articles are codified in terms of frequency, percentage, and density of interactive and interactiona...

متن کامل

Negation and Speculation in Natural Language Processing ( NeSp - NLP 2010 )

In view of the increasing need to facilitate processing the content of scientific papers, we present an annotation scheme for annotating full papers with zones of conceptualisation, reflecting the information structure and knowledge types which constitute a scientific investigation. The latter are the Core Scientific Concepts (CoreSCs) and include Hypothesis, Motivation, Goal, Object, Backgroun...

متن کامل

Rhetorical Move Detection in English Abstracts: Multi-label Sentence Classifiers and their Annotated Corpora

The relevance of automatically identifying rhetorical moves in scientific texts has been widely acknowledged in the literature. This study focuses on abstracts of standard research papers written in English and aims to tackle a fundamental limitation of current machine-learning classifiers: they are mono-labeled, that is, a sentence can only be assigned one single label. However, such approach ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010